Two space saving tricks for linear time LCP computation

Author

  • Giovanni Manzini
Abstract

In this paper we consider the linear-time algorithm of Kasai et al. [10] for the computation of the LCP array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce the space occupancy of this algorithm from 13n bytes to 9n bytes. We also consider the problem of computing the LCP array “overwriting” the suffix array. For this problem we propose an algorithm whose space occupancy depends on the regularity of the text. Experiments show that for linguistic texts our algorithm uses roughly 7n bytes. Our algorithm makes use of the Burrows-Wheeler Transform even though it does not represent any data in compressed form. To our knowledge this is the first application of the Burrows-Wheeler Transform outside the domain of data compression.
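
For orientation, the following is a minimal Python sketch of the standard formulation of the Kasai et al. algorithm that the abstract takes as its starting point. It is not the space-saving variant proposed in the paper: it keeps the auxiliary `rank` array (the inverse of the suffix array), which is the array the paper shows how to avoid. Assuming 1-byte characters and 4-byte integers, the text, suffix array, and LCP array account for n + 4n + 4n = 9n bytes, and the `rank` array would account for the remaining 4n of the 13n figure. The function names `suffix_array` and `kasai_lcp` are illustrative, and the suffix array builder is a naive one used only to make the example self-contained.

```python
def suffix_array(text):
    """Naive suffix array construction (illustration only, not linear time)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def kasai_lcp(text, sa):
    """Standard Kasai-style LCP computation.

    lcp[r] is the length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is set to 0 by convention.
    """
    n = len(text)

    # Auxiliary array: rank[i] is the position of suffix i in the suffix array.
    # This is the extra array the paper's first trick eliminates.
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos

    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)                   # [5, 3, 1, 0, 4, 2]
    print(kasai_lcp(text, sa))  # [0, 1, 3, 0, 0, 2]
```

The inner `while` loop never rescans matched characters from scratch: `h` drops by at most one per outer iteration, which is what gives the linear running time.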

Similar resources

Two Space Saving Tricks for Linear Time LCP Array Computation

In this paper we consider the linear time algorithm of Kasai et al. [6] for the computation of the Longest Common Prefix (LCP) array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce t...

Space-Time Tradeoffs for Longest-Common-Prefix Array Computation

The suffix array, a space efficient alternative to the suffix tree, is an important data structure for string processing, enabling efficient and often optimal algorithms for pattern matching, data compression, repeat finding and many problems arising in computational biology. An essential augmentation to the suffix array for many of these tasks is the Longest Common Prefix (LCP) array. In parti...

Range LCP

In this paper, we define the Range LCP problem as follows. Preprocess a string S, of length n, to enable efficient solutions of the following query: Given [i, j], 0 < i ≤ j ≤ n, compute max_{ℓ,k∈{i,...,j}} LCP(S_ℓ, S_k), where LCP(S_ℓ, S_k) is the length of the longest common prefix of the suffixes of S starting at locations ℓ and k. This is a natural generalization of the classical LCP problem. Sur...

Longest-Common-Prefix Computation in Burrows-Wheeler Transformed Text

In this paper we consider the existing algorithm for computing the Longest-Common-Prefix (LCP) array given a text string and its suffix array, and adapt it to work on Burrows-Wheeler Transform (BWT) text. We do this through a combination of pre-processing steps and improvements to the existing algorithm. Three LCP array computation algorithms were proposed, namely LCPB-A, LCPB-B and LCPB-C tha...

LPF Computation Revisited

We present efficient algorithms for storing past segments of a text. They are computed using two previously computed read-only arrays (SUF and LCP) composing the Suffix Array of the text. They compute the maximal length of the previous factor (subword) occurring at each position of the text in a table called LPF. This notion is central both in many conservative text compression techniques and i...

Publication date: 2004